An Examination of Hierarchical Bayesian Dynamic Structural Equation Models in Stan

MathPsych 2023

Jean-Paul Snijder1, Valentin Pratz1, and Anna-Lena Schubert2

1University of Heidelberg, 2University of Mainz

1 Dynamic Structural Equation Modeling (DSEM)

Introduction

  • Increasing need for models that capture dynamic changes of within-person variability
  • “short-term, within-person variability is noise should be eliminated” - old view
  • Technological advancements are increasing availability of Intensive Longitudinal Data (ILD) from:
    • Experience Sampling Methods (ESM, EMA, AA)
    • Electro-EncephaloGram (EEG)
    • Wearables
  • ILD are densely spaced repeated measures data collected from large samples
  • Computational models are adapted and developed to match this trend

Dynamic Structural Equation Modeling

  • DSEM extends cross-classified models by regression of variables on previous (lagged) time-points
  • Combines 1
    • Time-series modeling
      • allows lagged relationships
    • Multilevel modeling
      • allows modeling of nested data structures
    • Structural equation modeling
      • allows latent variable/path analysis

DSEM in Mplus

Pros

  • Widely used
  • online user and program support
  • Considered user-friendly
  • Low computational time
  • Gibbs sampler with conjugate priors (Normal, Inverse Wishart)

Cons

  • Not fully customizable
  • currently doesn’t support some model extensions and specifications
  • limited prior options and access to sampler settings
    • i.e., no LKJ distribution
  • Limited options for missing data
  • License costs money

DSEM in Stan

Pros

  • Free
  • Fully customizable
  • Open Code & Reproducible Science
  • Online community support
  • Efficient general-purpose MCMC sampling

Cons

  • Programming can pose a barrier
    • Fully code-based
    • No GUI
  • Higher computational time
    • but reasonable (minutes to hours)
    • not optimized for a specific model family

Our project

  • Stan tutorial using DSEM framework as example
    1. Introducing DSEM and improving the accessibility
    2. Provide a starting point to get started with Stan
  • 6 model archetypes1
    1. Bivariate, Single Case
    2. Bivariate, Multilevel
    3. Model 2 + predictor variable
    4. Model 2 + latent variable
    5. Model 3 + outcome variable
    6. Model 4 + mediation
  • For each archetype in Stan:
    1. Simple: tutorial model
    2. Reparam: reparameterized model
    3. Full: missing data model

2 Short Tutorial

M2: Bivariate Multilevel Model

  • Model 2: two variables + multilevel
    • Stress: Perceived Stress
    • Sleep: Last night sleep in number of hours

M2: Bivariate Multilevel Model

  • Model 2: two variables + multilevel
    • Stress: Perceived Stress
    • Sleep: Last night sleep in number of hours

  • Within- & between-person decomposition
    • Between: time-insensitive mean of subject
    • Within: time-sensitive deviation from that mean
  • Allows for specifying time-dynamics in within-person model

M2: Within-person Model I

  • The decomposed within-person variables are the start of the within-person model

M2: Within-person Model II

  • Stressi,t-1(w) and Sleepi,t-1(w) are lag(1) variables
  • E.g., if t = observation 9 \(\Rightarrow\) t-1 = observation 8

M2: Within-person Model III

  • Relationships & Parameters:
    • Regression:
      • \(\beta\)YX = X regressed on Y
    • Auto-regression:
      • \(\Phi\)X,i = auto-regressive parameter X
      • \(\Phi\)Y,i = auto-regressive parameter Y
      • \(\Phi\)XY,i = cross-regressive parameter Yi,t-1 onto Xi,t
    • Residual variances:
      • \(\Psi\)2X,i and \(\Psi\)2Y,i

M2: Between-person Model

3 Stan

Within-level model I

\[\begin{align} X_{i,t}^{(w)} &= \Phi_{X,i} X_{i,t-1}^{(w)} + \beta_{YX,i} Y_{i,t}^{(w)} + \zeta_{Xi,t} \\ Y_{i,t}^{(w)} &= \Phi_{Y,i} Y_{i,t-1}^{(w)} + \Phi_{XY,i} X_{i,t-1}^{(w)} + \zeta_{Yi,t} \\ \zeta_{Xi,t} &\sim \mathcal{N}(0, \Psi_{X,i}^2), \, \zeta_{Yi,t} \sim \mathcal{N}(0, \Psi_{Y,i}^2) \end{align}\]

Within-level model II

real zeta_Xit = X_w[t] - (phi_X * X_w[t-1] + beta_YX * Y_w[t]);
real zeta_Yit = Y_w[t] - (phi_Y * Y_w[t-1] + phi_XY * X_w[t-1]);
zeta_Xit ~ normal(0, psi_X);
zeta_Yit ~ normal(0, psi_Y);

\[\begin{align} X_{i,t}^{(w)} &= \Phi_{X,i} X_{i,t-1}^{(w)} + \beta_{YX,i} Y_{i,t}^{(w)} + \zeta_{Xi,t} \\ Y_{i,t}^{(w)} &= \Phi_{Y,i} Y_{i,t-1}^{(w)} + \Phi_{XY,i} X_{i,t-1}^{(w)} + \zeta_{Yi,t} \\ \zeta_{Xi,t} &\sim \mathcal{N}(0, \Psi_{X,i}^2), \, \zeta_{Yi,t} \sim \mathcal{N}(0, \Psi_{Y,i}^2) \end{align}\]

Between-level model I

  • Using latent means and random intercepts/effects

\[\begin{align} X_i^{(b)} &= \gamma_1 + u_{i1} \\ Y_i^{(b)} &= \gamma_2 + u_{i2} \\ \Phi_{Xi} &= \gamma_3 + u_{i3} \\ \Phi_{Yi} &= \gamma_4 + u_{i4} \\ \Phi_{XYi} &= \gamma_5 + u_{i5} \\ \beta_{YXi} &= \gamma_6 + u_{i6} \\ \log\Psi_{Xi}^2 &= \gamma_7 + u_{i7} \\ \log\Psi_{Yi}^2 &= \gamma_8 + u_{i8} \\ \end{align}\]

\[\begin{align} \boldsymbol{u}\sim\text{MVNormal}(\boldsymbol{0}, \boldsymbol{\Omega}) \end{align}\]

Between-level model II

  • Using latent means and random intercepts/effects


real mu_X = gamma[1] + u[i,1];
real mu_Y = gamma[2] + u[i,2];

real phi_X = gamma[3] + u[i,3];
real phi_Y = gamma[4] + u[i,4];
real phi_XY = gamma[5] + u[i,5];
real beta_YX = gamma[6] + u[i,6];

real psi_X = sqrt(exp(gamma[7] + u[i,7]));
real psi_Y = sqrt(exp(gamma[8] + u[i,8]));

u[i] ~ multi_normal(rep_vector(0, 8), Omega);

\[\begin{align} X_i^{(b)} &= \gamma_1 + u_{i1} \\ Y_i^{(b)} &= \gamma_2 + u_{i2} \\ \Phi_{Xi} &= \gamma_3 + u_{i3} \\ \Phi_{Yi} &= \gamma_4 + u_{i4} \\ \Phi_{XYi} &= \gamma_5 + u_{i5} \\ \beta_{YXi} &= \gamma_6 + u_{i6} \\ \log\Psi_{Xi}^2 &= \gamma_7 + u_{i7} \\ \log\Psi_{Yi}^2 &= \gamma_8 + u_{i8} \\ \end{align}\]

\[\begin{align} \boldsymbol{u}\sim\text{MVNormal}(\boldsymbol{0}, \boldsymbol{\Omega}) \end{align}\]

Optimization: Reparameterization

  • Improves convergence, can speed up sampling
  • Classical example: \(y \sim \mathcal{N}(\mu, \sigma^2) \Leftrightarrow y = \mu + \sigma\cdot\tilde y\text{ with }\tilde y\sim\mathcal{N}(0, 1)\)

Handling missing data: Imputation

  • Missing data is unknown
  • Parameters are unknown

\(\rightarrow\) treat missing data like parameters

  • Preserves uncertainty (unlike mean imputation etc.)

4 Results

Results with simulated data

  • 67 subjects
  • 90 observations
  • for missing data model: 5% missingness
  • no model misspecification
  • 500 warmup/3000 sampling iterations
  • 4 chains, 16 cores

Model convergence

Model 2, simulated data convergence.

Parameter recovery

Model 2, simulated data. Errorbars: 95% CI.

Results with real data


  • COGITO data 1
    • 67 subjects
    • 90 daily observations
      • Stress
      • Sleep
    • missing at random: < 1% of observations
    • true model unknown
    • 500/3000 iterations
    • 4 chains, 16 cores

Model convergence

Model 2, COGITO data convergence.

Parameter estimation

Model 2, COGITO data parameters. Errorbars: 95% CI.

Open questions

  • Model comparison/selection
  • Model criticism

5 Discussion

Future

  • current: Simulations
    • relevant parameter ranges
    • prior calibration
  • near: Standardized estimates
  • far: R Package with Stan as back-end

Thanks to

  • Ellen Hamaker for instructional material on DSEM
  • Mauricio Garnier-Villarreal (blavaan) for sharing his Stan knowledge online
  • The Stan community for educational material on reparameterization and other tricks

Thank you

Questions?

Github repo with Model 2 in Stan + presentation

References

Hamaker, E. L., Asparouhov, T., & Muthén, B. (2023). Dynamic Structural Equation Modeling as a Combination of Time Series Modeling, Multilevel Modeling, and Structural Equation Modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (2nd ed., pp. 576–597). New York: Guilford Press.
Schmiedek, F., Lövdén, M., & Lindenberger, U. (2010). Hundred Days of Cognitive Training Enhance Broad Cognitive Abilities in Adulthood: Findings from the COGITO Study. Frontiers in Aging Neuroscience, 2, 27. https://doi.org/10.3389/fnagi.2010.00027

DSEM software alternatives

  • dlsem in R:
    • Uses frequentist inference
  • ctsem in R:
    • Slow for full Bayesian estimation
    • Oriented towards continuous time systems
      • but discrete can be used
    • Less user-friendly
  • SAS, Stata, OpenMX
    • Offer very little support for DSEM
    • No readily available models